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1 Introduction 

Inference for point processes on the real line has been dominated by a dynamic approach 
based on the stochastic intensity [H [161 El] which expresses the likelihood of a point 
at any given time conditional on the history of the process. Such an approach is quite 
natural in that it is the mathematical translation of the intuitive idea that more and 
more information becomes available as time passes. Furthermore, the approach allows 
the utilisation of powerful tools from martingale theory. In a statistical sense, since 
the stochastic intensity is closely related to the hazard rates of the distribution of the 
length of the inter-point intervals, a likelihood is immediately available |8|. 

The dynamic approach, however, does not seem capable of dealing with situations 
in which the flow of time is interrupted. In such cases, combined state estimation 
techniques are needed that are able to simultaneously carry out inference and to recon¬ 
struct the missing points. More specihcally, state estimation aims to hnd ‘the optimal 
reconstruction, realization by realization, of unobserved portions of a point process 
or of associated random variables or processes’ [HI p. 93]. Other examples include 
extrapolation and prediction [21], hltering [2B], cluster detection [H] or the estimation 
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of the driving measure of a Cox process [23] . 

The aim of this paper is to apply ideas from sequential point process [20l[2T], in par¬ 
ticular the sequential Papangelou conditional intensity which describes the probability 
of hnding a point at a particular time conditional on the remainder of the process. 
Thus, the concept is related to the stochastic intensity, except that the future is taken 
into account as well as the past. It is this last feature that allows the incorporation of 
missing data. Moreover, for hereditary point processes at least, they dehne a likelihood. 

The plan of this paper is as follows. In Section |2l we represent a point processes 
on the real line as a sequential point process on semi-cubes and recall the dehnition 
of the sequential Papangelou conditional intensity. Section [3] derives the marginal and 
conditional distributions for the situation where the point process is observed in disjoint 
intervals only. Important special cases are studied in detail: Markov point processes 
in Section 14.11 renewal processes in Section 14.21 and Cox processes in Section |5l In 
Section 16.11 we show that a substantial bias may be incurred when missing data is not 
accounted for properly. We present a Metropolis-Hastings algorithm for sampling from 
the conditional distribution of the missing data as well as a locally dehned birth-and- 
death process based on the sequential Papangelou conditional intensity in Section 16.21 
and the Appendix. These techniques are then used in Monte Carlo inference. Finally, 
in Section [71 a health care surveillance data set collected by Diggle and Hawtin is 
analysed to illustrate the approach and we conclude with a summary. 


2 Finite point process on the line 

In this section, we review basic concepts from the theory of chronologically ordered, 
simple point processes on an interval [0,T], T > 0. 

Realisations of a simple point process X consist of a hnite number of distinct 
points {ti,..., tn} C [0, T] for n G Nq. Since the points are naturally ordered, there is 
a unique correspondence between the set {ti,... ,tn} and the vector (ti,..., tn) where 
ti < t2 < ■ ■ ■ < tn illlZ]- The special case n = 0 corresponds to an empty realisation. 
Equipping i7„([0, T]) = {(ti,..., E [0, T]" : ti < • • • < tn}, n eN, with the product 
Borel (T-algebra, the distribution of a hnite sequential point process Y can be specihed 
by m- 

• a probability mass function n E Nq, for the number of points in [0,T]; 

• for each n E N for which > 0, a probability density function pn on i7„([0,T]) 
for the chronologically ordered vector of point locations given that there are n of 
them. 

Note that the Pn need not be symmetric. 
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The likelihood of hnding n G N points, one at each of the locations ti,... in 
that order, is expressed by the Janossy densities [8] 

• • • ) in') QnPnitli • • • ; tn) 


on Hn(T). For n = 0, jo = go- More compactly, one may specify a density / on 
A^'^([0, T]) = U 5 ^o-^n ([05 ^]) (equipped with the a-algebra generated by the Borel prod¬ 
uct cT-helds) with respect to the probability measure 



• • •) tn) dti ■ ■ ■ dtji 


which is related to the unnormalised Janossy densities as /(ti,..., = e^jn{ti,... ,tn) 

and /(0) = e^Qo- Note that z/[ot] is the distribution of a unit rate Poisson process on 
[0,T]. 

The sequential Papangelou conditional intensity [21] for inserting t at position k G 
{1,..., n -f 1} is dehned by 


• • •) tji) 


f {tly • • • ) tk—lj • • • ) tn) 
f{tu---,tn) 


( 1 ) 


whenever both tk-i < t < tk and /(ti, ■ ■ ■ ,tn) > 0; it is set to zero otherwise. Re¬ 
versely, provided /(•) is hereditary in the sense that f(ti,... ,tn) > 0 implies that 
/(si,..., Sm) > 0 for all subsequenses (si,..., Sm) of (ti,... ,tn), the density factorises 
as 

n 

/(tl, ...,tn) = /(0) Yl Hti\tl, • • • , ti-l). (2) 

i=l 

At this point, it is important to note that the term Papangelou conditional intensity 
is often abbreviated to plain ‘conditional intensity’. We choose not to do so to avoid 
confusion with another factorisation often referred to by this name. Indeed, by [51 
Prop. 7.2.Ill], the probability density function of Y can be factorised as 


/(fl, . . . ,tn) = 

i=l 


h*{t)dt 


where h* is a combination of the hazard functions of the conditional probability density 
functions 7^n{tn\ti, ■ ■ ■ ,tn-i) that govern the location of the n-th point conditional on 
the ‘past’ points ti,... ,tn-i- We shall refer to h* as the ‘hazard rate’. Note that one 
sometimes writes h*(ti) = h*(ti\ti, ... to stress the dependence on the history of 

Y up to but not including U. The two concepts are related, but not identical in general. 
For a rigorous treatment, the reader is referred to [I61E7[. 
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3 Partially observed point processes 

Consider a sequential point process that is observed in a broken interval [0, Ti] U [T 2 , T], 
In other words, any point falling in (Ti, T 2 ) is not observed. Thus, a realisation consists 
of two chronologically ordered sequences, say r = (ri,... ,rk), k G No, in [0,Ti] and 
s = (si,..., sz), / e No, in [T 2 , T], 

In order to carry out likelihood based inference or state estimation, we need, re¬ 
spectively, the marginal distribution of the observations and the conditional law of the 
missing points. 

Proposition 1 Let Y be a simple sequential point process on [0,T], T > 0, defined by 
pf, qf as in SectionlE 

(a) The marginal distribution of the sequential point process Z defined as the restriction 
ofY to [0,Ti] U [T 2 ,T] has Janossy density 



where n{-) denotes vector length (with the convention that the integral for n = 0 is 
equal to Pn{f)+n{s)(^^ )■ other words, Z has density /^(r, s) = s) 

with respect to i^[o,Ti]u[r 2 ,T]- 

(b) If r and s are feasible in the sense that their marginal Janossy density ^ is 
strictly positive, then the conditional distribution ofY on (Ti,T2) given r and s 
has Janossy density 



for n G N, qo{r,s) = s). Equivalently, a conditional density with 

respect to z/(Ti,t2) is /(ti, ... ,tn\r,s) = e^^~'^^jn{ti, ..., f„|r, s). 

Proof: Split [0,T] in three parts: [0,Ti], (Ti,T 2 ) and [T 2 ,T]. Then, 



on the disjoint union set Llj+k+i=nHk,j,iiTi, T 2 , T) can be described as follows. A subset 
B is measurable exactly if Br\Hk ji{Ti,T 2 ,T) is a Borel set for all k,j,l adding up to n 
|12] . Clearly this holds for all Borel sets in T]) and, reversely, any B measurable 

in the union cr-algebra is the disjoint union of Borel sets, hence a Borel set itself. 
Consequently, Y can also be split into three well-dehned, though possibly dependent, 
sequential point processes by restriction to [0,T], (Ti,T 2 ) and [T 2 ,T]. 
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For k,j,l G No, write -Ffcj,z for the event that j points fall in (Ti,T 2 ), k points in 
[0,Ti] and / further points in [T 2 ,T]. Then 



when k + j + I > Q and otherwise. 

(a) For n G No, the marginal probability of placing n points in [0,Ti] U [T 2 ,T] is 
the probability of the disjoint union \J{Fkj^i : k,j, I G No, k + I = n}, that is, 

CXD 

k,lGNo:k-\-l=n j=0 


Similarly, suppose that > 0 and condition on having n points in [0, Ti]U[T 2 , T], 
Then, for any measurable A C iV-f([0,Ti] U [T 2 ,T]), 

OO 

F{Z e A\n{Z) = n) = ^ ^ P({E G A} n i|n(E) = n) 

k,lGNo:k-\-l=n j=0 

OO 

k,leNo-k+l=n j=0 

Since F{{Z G A} fl can be written as 

qI+j+i / / / lA(r,s)p^+^+;(r,t,s)drdtds, 

JHk{[0,Ti]) JHji(n,T2)) JHi{[T2,T]) 

a particular conhguration consisting of r G Ffc([0,Ti]) and s G F;([T 2 ,T]) with 
k + I = n has (conditional) probability density 



with respect to Lebesgue measure on Ff„([0,Ti] U [T 2 ,T]). 

(b) Next, condition on r G Hk{[0, Ti]) and s G Hi{[T 2 , T]). To compute the probability 
of having n, n G No, points in between Ti and T 2 , we may restrict ourselves to 
Fk^n,i- If P(Ffc,n,z) = 0, this probability is zero. Thus, assume F{Fk,n,i) > 0 and, 
a fortiori, > 0. On Fk,n,i, the (joint) likelihood of the k + n + I points 

is q^_^_n+lPk+n+li')^ hence integration over Hn{(Ti,T 2 )) yields that the marginal 
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density of (r, s) with respect to Lebesgue measure on Hk{[0,Ti]) x Hi{[T 2 ,T]) is 
given by 

Qk+n+l / Pr+„+z(f,t,s)dt. 

JHn{{Ti,T2)) 

We conclude that the conditional probability of hnding n points in (Ti, T 2 ) given 
r and s elsewhere is 


^k+n+l Ih„{{Ti,T 2)) Pk+n+li^y 

?n(r,s) = y - - -p- r ,r - 

z2j=0 ^k+j+l JHj((Ti,T2)) Pk+j+li^^ 

The denominator is strictly positive by assumption. Moreover, a conditional 
density function for the locations of points in y fl (Ti, T 2 ) given there are n such 
points and given realisations r and s elsewhere exists and reads 

Pfc+^+;(r, • • • , s) 
fHn{{Ti,T2)) ^’fc+n+z(y ^5 

provided the denominator is strictly positive, which is always the case if gn(r, s) > 

0 . 


□ 


For hereditary sequential point processes, we have the following simplihcation of 
Proposition [T](b). 

Proposition 2 Let Y be a simple hereditary sequential point process with density 
with respect to z^[o,t]- -hef r G iffc([0,Ti]) and s G ih;([T 2 ,T]) be feasible in the sense 
that {r,s) > 0. Then the conditional distribution ofY on (Ti,T 2 ), 0 < Ti < T 2 < T, 
given r and s is hereditary with Papangelou conditional intensity 

An+i(t|ti,..., tn; r, s) = ti,..., s) (5) 

equal to the Papangelou conditional intensity of Y for all (ti,...,tn) G if„((Ti,T 2 )) 
and t G (fn, T 2 ] provided /^(r, ti,..., s) > 0. 


Proof: By the proof of Proposition [T] part (b). 


go(r,s) 


ql+iPl+iij, s) 

j^(r,s) 


Therefore q^ir^s) = 0 would imply that j|^;(r, s) and hence /^(r, s) would be zero. 
Since Y is hereditary, /^(r, ti,..., s) would be zero for all < • • • < tj and all 





7 


j G No- Consequently, j^(r, s) would be zero, a contradiction with the assumption. 
We conclude that qo{r,s) > 0. 

For n G N, consider the chronologically ordered vector {ti,...,tn) with compo¬ 
nents in (Ti,T 2 ) and assume that jniti, ■ ■ ■ s) > 0. By (jlj) and the assumption 
that j'^(r, s) > 0, ti,..., s) > 0. Since Y is hereditary, all subsequences 

also have strictly positive Janossy density, in particular those including both r and s. 
Another appeal to dH) implies that the conditional point process is hereditary too. 

By the general theory outlined in Section [2l the density factorises as 

n 

f{ti, .. . ,tn|r,s) = ... ,ti_i;r,s) 

i=l 


in terms of the Papangelou conditional intensity 


Ai(tj|ti,..., tj—i, r, s) 


__ r (r,ti,...,ti,s) 

/(ti,... ,ti_i|r,s) /^(r,fi,... ,ti_i,s) 


for /(fi,..., s) > 0 and zero otherwise. The last equality follows from Proposi¬ 
tion [1] part (b) and implies ([5]). □ 


The Papangelou conditional intensity, in contrast to the hazard rate which is a 
function of the history of the process, is able to take into account past as well as future 
points. Indeed, there does not seem to be an analogue of ([5]) in terms of h*. 

The theorems above cannot be simplihed without specihc model assumptions. In 
the next sections we therefore specialise to, respectively, Markov point processes, 
nearest-neighbour processes and Cox models. 


4 Markov point process 

In this section, we consider models in which the dependence of a sequential point 
process in (Ti,T 2 ) on its future and past is limited either to a border region [Ti — 
R,Ti\ U [T 2 ,T 2 -|- R] for some hxed interaction range R (Section 14.ip or by its two 
nearest neighbours, say in [0,Ti] and Si G [T 2 ,T] (Section l4.2p . 

4.1 Fixed range dependence 

A chronologically ordered sequential point process Y on [0,T] is said to be Markov at 
range i? > 0 if it is hereditary and and, for all (fi, ...,for which {ti, ... ,tn) > 0 
and for all M 7^ fj, i = 1, ... ,n, the Papangelou conditional intensity A^^_]^('u|ti ,... ,tn) 
depends only on u and the set of points < u that have temporal distance u — ti<R. 








In other words, the underlying unordered point process is Markov in the classical sense 
[26] . Therefore, by the Hammersley-Clifford theorem, the density of Y factorises as 

...,tn) = l{tl < ■■■ < tn} JJ^(y) 

for some non-negative integrable interaction function 'ip, ranging over the collection 
of unordered hnite subsets y of {fi,... ,f„} C [0,T], that vanishes except on cliques. 
More precisely, V’(y) = 1 unless it consists of pairwise i?-close points. To get rid of the 
indicator function, dehne ip{u,y) = tpiy U > max(y)} for u G [0,T] so that 

n 

=/^(0) JJ Yl 

i=l 

cf. [20]. As usual, for n = 0, (ti,, to) is interpreted as the empty sequence. 

Next, turn to (jS]) and observe that it reduces to 

An+i(t|ti,...,tn;r,s) = JJ V^({t}Uy) Yl V’({OUxUy) , 

y<Z{ti,...,tn} 07^xCrUs 

the product of all interaction functions that involve the point t. We conclude that the 
Markov property is preserved under conditioning but the clique interaction functions 

A(^,y|r,s) = V7’^(t,y) ?/;({t}UxUy) 

0 ^xCrUs 

for y C {ti,... ,tn} and t > tn may change and depend on points of r and s in 
[Ti — R,T 2 + i?]. In particular, since 0|r, s) need not be constant in t even when 
(p'^{t,y) is, non-homogeneity may be introduced by the conditioning. 

The special case of pairwise interaction processes of the form 

n 

=/^(0) Yl (6) 

i=l 

with (ti,...,tn) G Hn{[0,T]) is of special interest. For such models, the conditional 
distribution of Y on (Ti,T 2 ), 0 < Ti < T 2 < T, given r on [0,Ti] and s on [T 2 ,T] has 
hrst and second order interaction functions 

(y9(t,0|r,s) = </9^(t,0) Yl n 

xGr:t—x<R xGs:x—t<R 

Therefore the conditional point process is also Markov at range R with pairwise inter¬ 
actions only. The second order interaction function is unaffected by the conditioning. 
The hrst-order interaction function depends on points of r and s up to a range R from 
(Ti,T2). 
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4.2 Nearest neighbour dependence 

A renewal process on [0,T] is defined as follows [161 Chapter 8]. Starting at time 0, 
let Ui be a sequence of independent and identically distributed inter-arrival times with 
common probability distribution function We assume that is non-defective and 
absolutely continuous with density vr. Its hazard function is denoted by Set Tq = 0 
and Tj = Tj_i -|- Ui for i G N. Then those Tj, i > 1 falling in (0,T] form a simple 
sequential point process Y. For simplicity, we assume that the process starts at time 
zero, but other starting distributions may also be accommodated 

Due to the independence assumptions in the model, the hazard rate of Y is particu¬ 
larly appealing. Write Vt for the backwards recurrence time at t, that is, the difference 
between t and the last event falling before or at time t. Then, h*{t) = [16], so 

that the density with respect to z/[o,r] is given by 




W_h* {U) 

i=\ 


h*{t)dt 


e'^(l - F^{T - U)) JJ 7i{ti - ti_i) 

i=l 


(7) 


for (ti,... ,tn) ^ Hn{[0, T]) [SI Hn], under the conventions that an empty product is set 
to one and that to = 0- 

The sequential point process Y is not necessarily hereditary, for example when vr 
has small bounded support. Thus, we shall assume that vr > 0. In this case, can 
be factorised in terms of its Papangelou conditional intensity 

An+l(t|tl, ...,tn) =7r{t- tn) ^ (§) 

for tn < t < T. It depends on the conhguration F,... only through the point F 
that is closest to f, no matter how large the distance t — tn- Such models are known 
as nearest-neighbour Markov point processes [3] . Their density satishes a factorisation 
similar to (Ej) for fixed range Markov processes. Indeed, for n G N, 


..., F) = /^(0) Yl 0) n ---Un) 

i=l i=2 


and /^(0) = e^(l — Ft,{T)). The interaction functions are 


^p^{t,iD) 


7r(t)(l-F^(T-t)) 

1 - F^{T) 


and 


{U, {t}\ti, . . . Un) 


Tiju-t) [l-F^(T)] 
7r{u) [l-F^(T-f)] 
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for u G {ti,... ,tn} and t = max{ti : ti < u} with (t,y\ti, ... ,tn) = 1 otherwise. 
Note that, as the pairwise interaction function involves two consecutive neighbours, it 
depends on the whole conhguration in contrast to the models discussed in the previous 
subsection. 

Next, consider the conditional distribution of Y on (Ti,T 2 ) given r on [0,Ti] and 
0 7 ^ s on [T 2 ,T]. Formula (jS]) specialises as follows: 

\ t-i-l-i- 4- . ^ ~ ~ 1 |-j. ^ 4- ^ rji (p,\ 

"^71+1 '1^71)^'1^) / ^\tn ^ t ^ 2 J (9) 

^(51 - tn) 

with the convention that for n = 0 , fo = max(r) if r 7 ^ 0 and fo = 0 otherwise. 
Compared to (IH]), the survival probability 1 — F^(T — t) is replaced by 7 r(si — t). If 
s = 0 , (E]) reads 


Xn+i{t\ti ,... ,fn;r,0) 


7r(t-t„)(l -F^(T-t)) 


l{tn < t < T 2 } 


as in dH]), except for the fact that t is restricted to (Ti, T 2 ). We conclude that the condi¬ 
tional distribution is a nearest-neighbour Markov point process on (Ti,T 2 ). Assuming 
s 7 ^ 0 , for n G M, its density can be written as 


fu , 1 ^ . 7r(ti - max(r))7r(si - tn) A , 

/(ti,...,tn|r,s) = /( 0 |r,s)--y-_ _ 


i=2 


7 r(si — max(r)) 

n n 

/( 0 |r, s) Yl 0 |r, s) JJ {t*-i}|ti. 




,^n;r,s) 


2=1 


2=2 


with 


V7(t,0|r,s) = 


7i{t — max(r))7r(si — t) 


7 r(si — max(r)) 


under the convention max( 0 ) = 0 , and 




7i{u — t)7i{si — max(r)) 
7i{u — max(r))7r(si — t) 


for u G {ti,... ,tn} and t = max{tj : ti < u} with ip"^{t,y\ti, ■ ■ ■ ,^n) = 1 otherwise. 
For the special case s = 0, the ratio 7r(si — t)/7r(si — max(r)) should be replaced by 
(1 — iyr(T — t))/(l — Ft^{T — max(r)). In conclusion, the boundary conditions r and s 
affect the interaction functions only through the two nearest neighbours max(r) and 
min(s), provided they exist. 
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5 Cox processes 

A Cox process F is a two-stage stochastic process. More specifically, let A(t) be a 
random field on [0,T] and define the random measnre 'h by 


^(A) = [ A{t)dt 

J A 

for all Borel sets A C [0,T]. Then, conditionally on A = A, F is a chronologically 
ordered Poisson process on [0, T] with intensity fnnction A. Clearly, conditions must be 
imposed on A so that the integral representation of T is well-defined, for example that 
A almost surely has continuous realisations BE]. The density and Janossy measures 
of F are found by integrating out over the distribution of A, that is. 


it («i. •••.*») = E 




( 10 ) 


for alHi < • • • < 

Since the Papangelou conditional intensity typically is not known explicitly, we 
focus on Proposition [U Firstly, upon plugging in formula (1T0|) . 


.. ,tn\r,s) 


^2^ 

j^(r,s) 


Ea 


n 

e-fCM A(„)JjA(t,) 

MS(r,s) i=l 


with the marginal Janossy density of Z, the restriction of F to ([0, Ti] U [T 2 , T]), given 
by 


i^(F.s) = Ea 


expl-<I-((|0,T,|uK,T]))] n A(«) 

u€(r,s) 


since conditional on A, Z is a Poisson process. Finally, splitting exp [—\k([0, T])] = 
exp [—T((Ti, T 2 )) — T(([0,Ti] U [T 2 ,T]))], we conclude that, provided r and s are fea¬ 
sible in the sense that j^(r, s) > 0, the conditional distribution of F on (Ti,T 2 ), 
0 < Ti < T 2 < T, given r and s is a Cox process with its driving random field 
distributed as the 


A(m) exp 

wG(f,s) 


[0,T]\[Ti,T2] 


A(t)dt 


weighting of the distribution of A. 

As a clarifying example, consider the so-called compound Poisson process on [0,T] 
for which A is constant over [0,T], taking either of the values Ai 7 ^ A 2 with equal 
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probability. Then, writing N for the sum of the cardinalities of r and s and T for the 
length of [0,Ti] U [T 2 ,T], 


jn(tly • • • ) s) 


E.Ar^^exp 

-A,(T)-A,(T2-Ti) 

E.Afexp 

-A.(T)' 



In other words, the conditioning has the effect of changing the probability of A* from 
a half to 




Afe-^iT + Afe-^^t' 


through N, the number of observed points. A further example will be studied in detail 
in Section [71 


6 Simulation study: renewal processes 

In this section, we will consider parameter and state estimation for a well-known re¬ 
newal process model. It is shown that naive approaches may result in bias. Instead, we 
shall adapt techniques for edge correction in spatial data to sequential point processes 
on the line. 

Figured] shows a realisation of a temporal renewal process on [0,4] observed within 
[0,1] U [3,4] with Erlang inter-arrival probability density 

vr(x) = , ^ X > 0, 

(a - 1)! 


for A = 40 and a = 2. 

For renewal processes observed in an unbroken interval [0, T], T > 0, [161 Chapter 8] 
surveys two approaches to estimate the parameters of the inter-arrival distribution vr. 
The hrst method is to treat the fully observed inter-arrival times Lj, i = 1,..., N{T), 
as a random sample from vr and apply maximum likelihood or the method of moments 
to obtain parameter estimates. The second approach is to use the explicit represen¬ 
tation of the likelihood in terms of the hazard rate h* and apply maximum likelihood 
estimation directly. Note that the hrst approach applies equally to observations on 
broken intervals, but the second one does not. 

6.1 Inference based on fully observed intervals 

Denote the lengths of the fully observed inter-arrival intervals by /i,... ,/„. If these 
would constitute a valid random sample, for a hxed, the model would be an expo¬ 
nential family with sufficient statistic U. The maximum likelihood estimator would 
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Figure 1: Sample from a renewal process with Erlang(2) inter-arrival times and rate 
parameter A = 40 observed in [0,1] U [3,4], Time is plotted against index number. 

exist and be given by A/a = n/ k- The parameter a could be estimated by prohle 
likelihood. In our context, however, the sample size n = N{T) is random. Neverthe¬ 
less, [16] suggests to proceed as if the k were a random sample and claims this causes 
relatively little loss of information. 

For the realisation consisting of 40 points depicted in Figure H] consider the 38 
observable inter-arrival times. For this sample, A = 40.91, whilst the prohle likelihood 
method yields d = 2. To assess the bias and variance, we generated 100 data patterns 
on [0,4] for the parameter values A = 40 and a = 2. We iteratively excluded the 
middle of the left-most interval and estimated A. The results are summarised in the 
table below. 


observation interval 

mean 

variance 

[0,1]U[3,4] 

42.1 

25.2 

[0,0.25] U [0.75,1] 

47.4 

136.2 

[0,0.0625] U [0.1875,0.25] 

128.1 

1 X 10^ 


We conclude that the bias and variance increase as the observation window contains 
less inter-arrival intervals. The bias occurs as smaller intervals are more likely to fall 
in the observation window, a phenomenon known as length bias [I6] . For the smallest 
interval, the bias is severe. Indeed, of the samples on the union of two intervals of 
length 1/16 each, less than half had observable inter-arrival times at all. Moreover, 
the sub-interval lengths are only slightly larger than the expected inter-arrival length 
of 1/20. 
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6.2 Monte Carlo maximum likelihood with missing data 

The approach of Section lCTl does not make full use of the available data, as it completely 
ignores partially observed inter-arrival intervals. Here, we adapt the method of [13] for 
dealing with missing data to better account for such intervals. 

Again, suppose F is a renewal process on [0,T] that is observed on [0,Ti] U [T 2 ,T]. 
Write IT = Tig for the density of the inter-arrival time distribution. By part (b) of 
Theorem [H j^(r, s) is a normalising constant for the conditional Janossy density on 
(ri,T 2 ) given the conhguration elsewhere, say r on [0,Ti] and s on [T 2 ,T]. Hence, 
writing U for the vector of un-observed points in Y falling in (Ti, T 2 ), the log likelihood 
ratio with respect to a given reference parameter 6*0 is [131121] 



where fj is of the form ([7|). In general, there is no explicit expression, so we must rely 
on Monte Carlo approximation, that is, replace the expectation by an average over a 
sample from the conditional distribution to obtain 



( 11 ) 


Specihcally, the U* form a sample from the conditional distribution of U on (Ti,T 2 ) 
given Z = (r, s) on [0, Ti] U [T 2 , T] under the reference parameter 6q. By taking deriva¬ 
tives, we get the Monte Carlo score and Fisher information. Note that even for expo¬ 
nential families, in the missing data case there may not be a unique maximum likelihood 
estimator. 

To generate a sample U*, i = 1,..., iV, we use the Metropolis-Hastings approach P 
musiisD] and tune it to the present context. An alternative is to use birth-and-death 
processes [25], but note that special care is needed to avoid explosion while ensuring 
sufficient mixing at the same time. Further details are provided in the Appendix. 

Let us return to the model of Subsection 16.11 and the data plotted in Figure [H The 
parameter is the rate A, and, for 0 < H < • • • < < T = 4, 



Hence, the sufficient statistic consists of n, the number of points, and T — the 
backward recurrence time from T. Write r = (ri,... ,rk) for the points observed in 
[0,1] and s = (si, ... ,si) for those in [3,4]. For our data, r consists of 19 points, s of 


21 . 


To hnd a suitable reference parameter for the log likelihood ratio, we use the 
Newton-Raphson method [TH]. For the data of Figured] this method gives Aq = 40.531. 
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Figure 2: Monte Carlo log likelihood ratio for the data shown in Figure 1 based on 
10, 000 samples from a long run of the Metropolis-Hastings algorithm sub-sampled 
every thousand steps with Aq = 40.531. 

We compute the Monte Carlo log likelihood ratio Fjv(A) using N = 10,000 samples 
from a long Metropolis-Hastings run sub-sampled every thousand steps after a thou¬ 
sand steps burn-in. Note that it is sufficient to store the sufficient statistics only, in 
this case the length of the vectors U*. The function is shown in Figure O It has a 
unique maximum at A = 40.36; the Monte Carlo inverse Fisher information is 20.3. 

To assess the bias and variance, we considered a hundred data patterns as in Sec¬ 
tion 16.11 For each pattern, after ten Newton-Raphson steps from the true value 
Ao = 40.0, N = 1,000 Monte Carlo samples were obtained by sub-sampling in a 
long run every thousand steps after a burn-in of a thousand steps. The results are 
summarised in the table below. 


observation interval 

mean (A) 

variance 

[0,1]U[3,4] 

40.9 

22.8 

[0, 0.25] U [0.75,1] 

40.8 

76.7 

[0,0.0625] U [0.1875,0.25] 

40.8 

457.3 


One sees that compared to the naive approach, the bias is much reduced by correctly 
taking into account missing data and does not increase noticeably for smaller sub¬ 
intervals. The variance is also reduced but does increase when more data is missing. 
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Figure 3: Calls to NHS Direct from Hampshire in September-October 2001. The date 
of calls, in days starting January 1st, 2001, is plotted against index number. 

7 Application 

Figure [3] shows daily records of calls to NHS Direct, a phone service operated by 
the National Health Service in Britain which people could call 24 hours a day to get 
medical advice. The calls shown are those that reported acute gastroenteric complaints 
in the county of Hampshire (on the English south coast and including e.g. the towns of 
Winchester and Southampton). The full data were described and analysed in p iTOlITT] . 
A modified version provided by Professor Diggle and Dr Hawtin is available at 

WWW.lancaster.ac.uk/staff/diggle/pointpatternbook/datasets/AEGISS. 

The latter data contain 7167 records from the years 2001 and 2002. Figure E] plots the 
327 calls recorded during September and October 2001. A salient feature is that no 
calls were registered during the period September 13th-September 30th with recording 
resuming on October 1st, 2001. This gap is clearly visible in the hgure. 

As in [9], we assume that calls are made according to a log-Gaussian Cox process 
on [0,T] with T = 61 (time measured in days) so that in the notation of Section [5] 
A{t) = The integrable function po : [0,T] —)■ M+ accounts for temporal 

variations and S is an Ornstein-Uhlenbeck process, that is, a stationary, Gaussian, 
Markovian process with exponentially decaying covariance function 

Cov{S{t),S{t')) = 
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The mean is chosen so that the expectation of exp(S'(t)) = 1. By [H [2l |23], S has 
almost surely continuous realisations and the model is well-dehned. 

Since the data is discretised in daily counts, the intensity of counts during the Tth 
day may be approximated by i = 0,..., 60. As S' is an Ornstein-Uhlenbeck 

process, it satishes the Markov recursion 

S(0) = -y + crr(0) 

2 

S{i) = -y (l - e“^) + e“^S(i - 1 ) + (Tr(i) 

for independent zero-mean normally distributed random variables r(*) having variance 
1 — exp(—2/3) under the convention 1 for r(0). In terms of the r(i), 

S(^) = -y + cr^e-^(*-^)r(j). (12) 

j=0 

In words, S{i) is a sum of independent normally distributed components discounted 
by elapsed time. 

The goal of this section is two-fold. Firstly, we shall contrast moment based es¬ 
timation with and without taking into account the gap; then, we will consider state 
estimation. 


7.1 Parameter estimation 


Although in principle it is possible to carry out likelihood based inference, this would 
be computationally costly since the missing data consist not only of the missing counts 
in September 2001, but also of all the S{i) or, equivalently, the r(i), cf. Section El 
Hence [9] proposed to use moment methods. 

First consider /iq. Since the expectation of is one, /io(*) is the expected number 
of counts for day i. This observation led [9] to £t a Poisson log-linear regression model 
of the form 


log/io(i) = Sd(i) +ai cos 



-|- bi sin 



-|- 02 cos 



-|- 62 sin 



+ gi, 


i = 0,..., 60. This model takes into account the apparent increase of incidences in 
Spring, the overall increase in calls to NHS Direct over time and the weekend-effect 
through the d{i), but ignores over-dispersion due to the Gaussian field. Note also that 
the counts are conditionally rather than jointly independent. 

To assess the effect of the gap, we estimate the parameters based on two scenario’s 
as follows. A naive estimator treats the zero counts in September as if they were true 
observations. The result is the red line in Figure 01 A more sophisticated approach 
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Figure 4: Weekly averages of calls to NHS Direct during 2001-20002 plotted against 
the week index. Red line: fit by naive approach; Black: fit by approach taking into 
account the gap. 


disregards the time period from September 13-30, with the black line in Figure [Has a 
result. The trough due to the missing data is clearly visible in the picture and, due to 
the periodic nature of the htted model, also affects the behaviour a year later. 

Next, turn to estimation of cr^ and (3. Let W, i = 0,..., / = 729, denote the 
observed number of calls during day i. The analogue of the pair correlation function 
[7] for count data is (iVWi-j/)/(ho(*)/^o(* “ ^)) which has expectation exp at 

lag u ^ 0 regardless of i. Therefore, the minimum contrast method minimises 


E 

i/=i 


I 




-\ 2 


1-1^ + ^ —f, Mi)Mi - 


- exp 


(13) 


over (3 and Here, fiQ is an estimate of po, / = 729 and m is the number of lags 
considered. 

Minimising flT^ with po based on all data including the spurious zeroes using 
m = 14 lags, we obtain cx^ = 0.12 and /3 = 0.55. The £t is indicated in the left-most 
panel of Figure [5l Note that po is underestimated, hence the pair correlation function 
is overestimated. 

The zero counts between September 13th and September 30th, 2001, should not be 
taken into account. Thus, in IfT^ . we plug in the appropriate estimator po (on which 
the black line in Figure 0] is based) and for lag z/, consider only pairs (W, -Nj-i/) for 
which both i and i — z/ do not fall in the period September 13-30. Doing so, we obtain 
cr^ = 0.11 and (3 = 0.91. Thus, the value of is not much affected, but that of (3 is. 
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Figure 5: Minimum contrast fit for the naive approach (left) and when taking into 
account the missing data (right). The points are the mean of NiNi_y/— z/)) 
plotted as a function of the lag z/. 

The £t is indicated in the right-most panel of Figure [5l We conclude that correlations 
between counts separated by a week or more are small. 

7.2 State estimation 

Our hnal goal is to £11 the gap in Figure [3] by sampling from the conditional distribution 
given the data. Now, as the estimated pair correlation function shown in Figure [5] is 
close to one for lags of a week and more, one may restrict attention to the counts 
n{Ti — 6),..., n{Ti) and n{T 2 ), ..., n{T 2 + 6) either side of the gap. Moreover, for Cox 
processes, state estimation amounts to sampling from the driving random measure, cf. 
Section [5l which, by flT^ . is uniquely dehned by the T{i), i = 0,... ,T. Therefore, the 
log likelihood, up to a constant c{/3,a‘^, ^o,n{i)i), is given by 

T 

- - E 2(1 [nWS,W - A.We*-'®] . (14) 

i=l ^ ' iG{Ti- 6 ,...,ri,T 2 ,...,T 2 + 6 } 

Since c{/3,a‘^, cannot be evaluated exactly, we use a Metropolis-Hastings 

algorithm to sample from (flTD as in [5]. The proposal distribution is a multivariate 
normal one with independent components having variance h > 0 and mean h/2 times 
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Figure 6: State estimation for the data shown in Figure |3l The date of calls is plotted 
against index number. 


the gradient of ffTTl) . Thus, transitions are steered into a desirable direction. It is shown 
in [27] that the Markov chain is aperiodic and Lebesgue-irreducible, so that for almost 
all starting states, the chain converges to fll4p in total variation. 

For h = 0.5, we ran the chain for 10, 000 steps. Using the realisation thus obtained, 
shown in Figure jO] we sampled the counts and re-estimated the parameters by mini¬ 
mum contrast. The results are similar to those obtained using the more sophisticated 
approach in Section 17.11 Indeed, = 0.11 and (5 = 0.93, whereas the plot of jXo 
averaged over weeks is almost identical to the black line in Figure HI 

Next, we extended the run, sub-sampling every 1, 000 steps, to obtain 10, 000 real¬ 
isations of flTT|) and calculated the histogram of the total intensity in the gap period. 
It is shown in Figure [71 In conclusion, around 150 calls may have been missed. 

8 Summary 

In this paper, we considered temporal point processes observed in broken observation 
windows. We derived the marginal and conditional distributions in terms of the Janossy 
densities of the underlying sequential point process and studied Markovian and Cox 
model in more detail. We carried out a simulation study to assess the the length bias 
in renewal processes and analysed a real-life data set. The approach can easily be 
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Figure 7: Histogram of the conditional total intensity of calls during September 13-30, 
2001, given the observed calls plotted in Figure |3l 

extended to space-time by incorporating spatial marks. 
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Appendix: Markov chain Monte Carlo 

To simulate a point process whose distribution is defined by the Papangelou conditional 
intensity, we adapt two general strategies to our context. Throughout, let V be an 
hereditary sequential point process on [0,T] with density / with respect to z/[o,t] and 
Papangelou conditional intensity A*(-I-). 


Metropolis—Hastings sampling 


The Metropolis-Hastings method works by proposing an update according to a distri¬ 
bution that is convenient to sample from, and then to accept or reject this proposal 
with a probability that is chosen so as to make sure the detailed balance equations are 
satisfied [6]. 

The two generic types of proposals are births and deaths. More precisely, with 
probability 1/2 propose a birth, otherwise a death. In the first case, select a point u 
uniformly on [0,T] and insert it in its chronological position i. In case of a death, a 
point is chosen uniformly for deletion; if the sequence is empty, nothing happens. Then, 
if the current vector is (fi,..., f„), the proposal to add u is accepted with probability 


min 



Xi{u\ti, . . .,tn)T 
n + 1 




the proposal to delete the Tth point with probability 


• ^ 

T7—i-vr 

It is easily seen that /(•) is an invariant density. Moreover, if we start the chain 
in a sequence t for which /(t) > 0, the chain will almost surely never leave the set of 
states having positive density. 

In order to show that the Metropolis-Hastings chain n G Mq, converges to /(■) 
in total variation from any initial state having positive density, it is sufficient to assume 
stability, that is, that the Papangelou conditional intensity is uniformly bounded by 
some /5 > 0. The proof follows the same lines as in [TB] . 



Birth-and-death process sampling 

Since the Papangelou conditional intensity may fluctuate a lot, we define birth-and- 
death processes in terms of local bounds. More precisely, assume that 

Xi{u\ti ,..., tn) < g{u\ti, ...,tn)</3 (15) 

for some integrable function g that is constant on i = l,...,u -|- 1, say 

gi(ti,... ,tn), and some (3 > 0, with an appropriate convention for i = 1 and i = 
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n + 1. Then, the birth-and-death algorithm runs as follows. If the current state is 

t (ti ,..., tji'j , 

• compute the upper bound G(ti,... ,tn) to the total birth rate by 

n+l „ti >^+1 

G(fi, . . . , in') ^ ^ • • • 1 tn}du ^ ■ ■ ■ i tn) 

i=l JU-l i=i 


and take death rate D{ti,... ,tn) = u; 

• generate an exponentially distributed sojourn time with rate parameter G(t) + 
D(t)i 


• with probability 


^ La i generate a new point (‘birth’) as follows: 

G(t 


— sample an interval with probability 
uniformly in the chosen interval; 


{tj-tj-ijgi (t) 

G(t) 


and propose a new point u 


— accept the proposal with probability 


AiHt) . 
3i(t) ’ 


• with probability 


^(t) 

G(t)+D(f) ’ 


delete a uniformly chosen point. 


is 


Then the rate for a transition from t to (ti ,... ,ti,u, fj+i ,... ,tn) for < u < tj+i 


fGftj + rftjj 1 A^(n|t) 

^ ^ ^^G'(t) + D(t) G'(t) g^(t) 


Ai(M|t) 


so that the birth rate is effectively Xi{u\ti,... ,tn) on Therefore, / is an 

invariant density. An appeal to [25] implies the existence of a unique jump process 
with the given birth and death rates; / is its unique invariant probability density and 
the process converges to / in distribution from any t for which /(t) > 0. 


Renewal process As an example, consider a renewal process with Erlang inter¬ 
arrival times. To derive a local bound on the conditional intensity, note that for 
a < ^ < b, 

7r(^ - a)7r(& - Q ^ 

nilj — a) “ (a — 1)! \ 4 / 

^ A-* _ .o-i Et'o A-(t-0‘A! 

l-F,(b-a) (a-1)!'^ EE,'A‘(6 - o)Vi! 


and 













is bounded from above by 
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A"(6 —a)" ^ ^ X{b — a) * 

(a-^ L 4 J 

The last bound follows from the fact that 

{^-ar-\b-0^ = (e-0(e-a)r 

'(h _ a')2]* 

< (6 - = (5 _ a)“-^+*4-h 

The bound can be improved upon in some cases by noting that the function 

0 : ^ ^ 7r(^ - a)(l - F^{b - 0) 

increases on (a, 6) if A — (o; — l)/(6 — a) < 0. Finally, since the sub-interval length 
b — a <T is bounded, stability follows. 



